Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
基于全注意力的变压器体系结构的强大建模能力通常会导致过度拟合,并且 - 对于自然语言处理任务,导致自动回归变压器解码器中隐式学习的内部语言模型,使外部语言模型的集成变得复杂。在本文中,我们探索了放松的注意力,对注意力的重量进行了简单易于实现的平滑平滑,从编码器。其次,我们表明它自然支持外部语言模型的整合,因为它通过放松解码器中的交叉注意来抑制隐式学习的内部语言模型。我们证明了在几项任务中放松注意力的好处,并与最近的基准方法相结合,并明显改善。具体而言,我们超过了最大的最大公共唇部阅读LRS3基准的26.90%单词错误率的先前最新性能,单词错误率为26.31%,并且我们达到了最佳表现的BLEU分数37.67在IWSLT14(de $ \ rightarrow $ en)的机器翻译任务没有外部语言模型,几乎没有其他模型参数。代码和模型将公开可用。
translated by 谷歌翻译
基于参考的线路上色是计算机视觉中的一项具有挑战性的任务。颜色,纹理和阴影是根据抽象草图渲染的,该草图在很大程度上依赖于草图和参考之间的精确远程依赖模型。桥接跨模式信息并建模远程依赖性的流行技术采用了注意机制。但是,在基于参考的线路颜色化的背景下,几种技术将加剧现有的注意力训练困难,例如,自我监督的培训方案和基于GAN的损失。为了了解训练的不稳定,我们检测到注意力的梯度流并观察到注意力分支之间的梯度冲突。这种现象激发了我们通过在消除冲突阶段的同时保留主导梯度分支来减轻梯度问题。我们提出了一种使用这种训练策略,定格梯度注意(SGA)的新型注意机制,通过较大的边缘和更好的训练稳定性优于基线。与最新的线艺术色彩中的最新模块相比,我们的方法表明,FR \'Echet Inception距离(FID,最高27.21%)和结构相似性指数量度(SSIM,高达25.67%)的显着改善。几个基准。 SGA代码可从https://github.com/kunkun0w0/sga获得。
translated by 谷歌翻译
自我监督学习(SSL)在语音识别方面取得了巨大的成功,而有限的探索已尝试完成其他语音处理任务。由于语音信号包含多方面的信息,包括说话者身份,副语言学,口语内容等,学习所有语音任务的通用表示都具有挑战性。为了解决该问题,我们提出了一个新的预培训模型WAVLM,以解决全堆栈的下游语音任务。 Wavlm共同学习了蒙面的语音预测和预训练。通过这种方式,WAVLM不仅可以通过掩盖的语音预测来保持语音内容建模能力,而且还可以通过语音denoing来提高非ASR任务的潜力。此外,WAVLM还采用封闭式的变压器结构的封闭相对位置偏置,以更好地捕获输入语音的序列排序。我们还将培训数据集从60k小时扩展到94K小时。 WAVLM大型在精湛的基准上实现了最先进的性能,并在其代表性基准上为各种语音处理任务带来了重大改进。代码和预培训模型可在https://aka.ms/wavlm上找到。
translated by 谷歌翻译
POI推荐是旅游信息系统的关键任务。然而,与传统的兴趣点(POI)推荐系统相比,可用数据非常稀疏;大多数旅游访问一些观光点一次,大部分景点都没有来自新游客的登记数据。大多数传统系统基于他们的受欢迎程度,声誉和基于类别的类似价值与用户的偏好等级等级。他们不澄清用户可以在这些斑点中体验的内容,这使得难以满足各种旅游需求。为此,在这项工作中,我们提出了一种推荐游客的机制。我们的机制包括两个组件:一个是一个概率模型,揭示了旅游业的用户行为;另一个是伪评级机制,用于处理POIS建议中的冷启动问题。我们对从Flickr收集的两个数据集进行了广泛的实验。实验结果表明,我们的方法优于推荐表演(精确,召回和F测量)和公平性的最先进的方法。实验结果还验证了所提出的方法的鲁棒性,即,我们的方法可以处理数据稀疏性问题。
translated by 谷歌翻译
作为现代深度学习的重要成分,关注机制,特别是自我关注,在全球相关发现中起着至关重要的作用。但是,在建模全局背景时,手工制作的注意力不可替代?我们的兴趣发现是自我关注并不优于20年前开发的矩阵分解(MD)模型,了解编码长距离依赖性的性能和计算成本。我们将全局上下文问题模拟为低级别恢复问题,并显示其优化算法可以帮助设计全局信息块。然后,本文提出了一系列汉堡包,其中我们采用了优化算法来解决MD,以将输入表示分解为子矩阵并重建低级别嵌入。具有不同MDS的汉堡包可以在小心地应对通过MDS的梯度时,对流行的全球背景模块自我关注进行。在愿景任务中进行综合实验,在那里学习全球范围至关重要,包括语义分割和图像生成,展示了对自我关注及其变体的显着改善。
translated by 谷歌翻译
采样是图形神经网络(GNN)培训的关键操作,有助于降低成本。以前的文献已经通过数学和统计方法探索了改进采样算法。但是,采样算法和硬件之间存在差距。在不考虑硬件的情况下,算法设计人员仅在算法级别优化采样,缺少通过利用硬件功能来促进现有采样算法效率的巨大潜力。在本文中,我们开创了一个为主流采样算法提出的统一编程模型,称为GNNSampler,涵盖了各个类别中采样算法的关键过程。其次,为了利用硬件功能,我们选择数据局部性作为案例研究,并在图中探索节点及其邻居之间的数据位置,以减轻采样中不规则的内存访问。第三,我们在GNNSampler中实现了各种采样算法的局部感知优化,以优化一般的采样过程。最后,我们强调在大图数据集上进行实验,以分析训练时间,准确性和硬件级指标之间的相关性。广泛的实验表明,我们的方法通用到主流采样算法,并有助于大大减少训练时间,尤其是在大规模图中。
translated by 谷歌翻译
最近,基于注意的编码器 - 解码器(AED)模型对多个任务的端到端自动语音识别(ASR)显示了高性能。在此类模型中解决了过度控制,本文介绍了轻松关注的概念,这是一种简单地逐渐注入对训练期间对编码器 - 解码器注意重量的统一分配,其易于用两行代码实现。我们调查轻松关注跨不同AED模型架构和两个突出的ASR任务,华尔街日志(WSJ)和LibRisPeech的影响。我们发现,在用外部语言模型解码时,随着宽松的注意力训练的变压器始终如一地始终如一地遵循标准基线模型。在WSJ中,我们为基于变压器的端到端语音识别设置了一个新的基准,以3.65%的单词错误率,最优于13.1%的相对状态,同时仅引入单个HyperParameter。
translated by 谷歌翻译
Photonic neural networks are brain-inspired information processing technology using photons instead of electrons to perform artificial intelligence (AI) tasks. However, existing architectures are designed for a single task but fail to multiplex different tasks in parallel within a single monolithic system due to the task competition that deteriorates the model performance. This paper proposes a novel optical multi-task learning system by designing multi-wavelength diffractive deep neural networks (D2NNs) with the joint optimization method. By encoding multi-task inputs into multi-wavelength channels, the system can increase the computing throughput and significantly alle-viate the competition to perform multiple tasks in parallel with high accuracy. We design the two-task and four-task D2NNs with two and four spectral channels, respectively, for classifying different inputs from MNIST, FMNIST, KMNIST, and EMNIST databases. The numerical evaluations demonstrate that, under the same network size, mul-ti-wavelength D2NNs achieve significantly higher classification accuracies for multi-task learning than single-wavelength D2NNs. Furthermore, by increasing the network size, the multi-wavelength D2NNs for simultaneously performing multiple tasks achieve comparable classification accuracies with respect to the individual training of multiple single-wavelength D2NNs to perform tasks separately. Our work paves the way for developing the wave-length-division multiplexing technology to achieve high-throughput neuromorphic photonic computing and more general AI systems to perform multiple tasks in parallel.
translated by 谷歌翻译